-
Notifications
You must be signed in to change notification settings - Fork 2.3k
vttablet: add proxy support to FullStatus RPC in tabletmanager
#19058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
vttablet: add proxy support to FullStatus RPC in tabletmanager
#19058
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Tim Vaillancourt <[email protected]>
6fb2987 to
6da5c3e
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #19058 +/- ##
==========================================
- Coverage 69.89% 69.88% -0.02%
==========================================
Files 1612 1613 +1
Lines 215826 216058 +232
==========================================
+ Hits 150857 150997 +140
- Misses 64969 65061 +92 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
|
|
||
| func (s *server) proxyFullStatus(ctx context.Context, request *tabletmanagerdatapb.FullStatusRequest) (*replicationdatapb.FullStatus, error) { | ||
| if s.tmc == nil { | ||
| return nil, vterrors.New(vtrpcpb.Code_FAILED_PRECONDITION, "no proxy tabletmanger client") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The init() func of this package creates the *server with a tmclient so this situation is kind of impossible
| // proxy_timeout_ms specifies the maximum number of milliseconds to wait for a | ||
| // proxied request to complete. Must be less than topo.RemoteOperationTimeout | ||
| // on the tablet proxying the request. | ||
| uint64 proxy_timeout_ms = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this not be a vttime.Duration?
| // disallow timeouts larger than the local remote operation timeout | ||
| if request.ProxyTimeoutMs > uint64(topo.RemoteOperationTimeout.Milliseconds()) { | ||
| return nil, vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "cannot set a proxy timeout ms greater than %d", topo.RemoteOperationTimeout.Milliseconds()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure of the value of allowing the caller to specify a timeout rather than using ~ the topo.RemoteOperationTimeout / 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems OK to me, but in general I do not like intentionally adding dead/unused code. We are supposed to remove dead/unused code. 🙂
I would like to see you use this new work, in vtorc, in the same PR that you create the RPC changes. This does the following:
- Avoids new dead code
- Ensures that the underlying building blocks (RPC work in this case) actually work as needed in order to meet the larger feature/product goal. Without that we should have no confidence that this is what we need, and having to refactor it later will make things harder. Or maybe we don't even end up using it for a variety of reasons (see point 1).
Can you please implement the actual feature that you wish to add to Vitess in this PR along with the lower level RPC changes and anything else that is needed in order to implement the feature request? Then we can understand the problem, the reasons why this is an optimal solution, and confirm/demonstrate that it actually works as desired and solves the problem.
Thanks!
P.S. I think the general issue that we are trying to address here (which I have not seen laid out so I'm not sure) would typically be resolved with something like a gossip protocol. The caller having to decide if and when to proxy a request, and where to proxy it to, doesn't immediately feel to me like the best solution. But then again, I also don't see where we've clearly described the problem we want to solve and why we think this is the best solution. 🙂
Description
Add support for proxying

FullStatusRPCs, in order for VTOrc to gain the ability to validate certain problems using many cells (locations) and detect network partitions - this will happen in future PRs (possibly v24), for now having this RPC in v24 will be helpfulSupport for this is intentionally not added to
vtctldclient GetFullStatus, because it's intended for internal VTOrc usage and I can't think of a good use case for someone at a CLI to need thisTo add some safety, a request can only be proxied once (the
ProxiedByflag is added by the proxying tmserver) and you cannot proxy to "yourself", which would cause an infinite loop. Finally, a proxy timeout > the remote operation timeout of the proxying server returns an error. e2e tests are added to confirm these safety nets and that the proxying succeedsRelated Issue(s)
Checklist
Deployment Notes
AI Disclosure